Abstract

Sampling permutations from $S_n$ is a fundamental problem from probability theory. The
nearest neighbor transposition chain $\mathcal{M}_{nn}$ is known to converge in time
$\Theta(n^3 \log n)$ in the uniform case [18] and time $\Theta(n^2)$ in the constant bias case, in
which we put adjacent elements in order with probability $p \neq 1/2$ and out of order with
probability $1 - p$ [2]. Here we consider the variable bias case where the probability of putting an
adjacent pair of elements in order depends on the two elements, and we put adjacent elements
$x < y$ in order with probability $p_{x,y}$ and out of order with probability $1 - p_{x,y}$. The
problem of bounding the mixing rate of $\mathcal{M}_{nn}$ was posed by Fill [9] and was motivated
by the Move-Ahead-One self-organizing list update algorithm. It was conjectured that the chain
would always be rapidly mixing if $1/2 \le p_{x,y} \le 1$ for all $x < y$, but this was only known
in the case of constant bias or when $p_{x,y}$ is equal to 1/2 or 1, a case that corresponds to
sampling linear extensions of a partial order. We prove the chain is rapidly mixing for two
classes: "Choose Your Weapon," where we are given $r_1, \dots, r_{n-1}$ with $r_i \ge 1/2$ and
$p_{x,y} = r_x$ for all $x < y$ (so the dominant player chooses the game, thus fixing his or her
probability of winning), and "League Hierarchies," where there are two leagues and players from
the A-league have a fixed probability of beating players from the B-league, players within each
league are similarly divided into sub-leagues with a possibly different fixed probability, and so
forth recursively. Both of these classes include permutations with constant bias as a special
case. Moreover, we also prove that the most general conjecture is false. We do so by constructing
a counterexample where $1/2 \le p_{x,y} \le 1$ for all $x < y$, but for which the nearest neighbor
transposition chain requires exponential time to converge.
1 Introduction

Sampling from the permutation group $S_n$ is one of the most fundamental problems in probability
theory. A natural Markov chain that has been studied extensively is a symmetric chain that makes
nearest neighbor transpositions, $\mathcal{M}_{nn}$. After a series of papers [5, 6], Wilson [18]
showed a tight bound of $\Theta(n^3 \log n)$ on the mixing time, with upper and lower bounds within
a factor of two. Subsequently Benjamini et al. [2] considered a biased version of this Markov chain
where we select a pair of adjacent elements at random and put them in order with probability
$p > 1/2$ and out of order with probability $1 - p$. They related this biased shuffling Markov
chain to a chain on an asymmetric simple exclusion process (ASEP) and showed that they both
converge in $\Theta(n^2)$ time. These bounds were matched by Greenberg et al. [10], who also
generalized the result on ASEPs to sampling biased surfaces in two and higher dimensions in optimal
$\Theta(n^d)$ time.

In this paper we consider a generalization where we are always at least as likely to put a pair
of adjacent elements in increasing order as out of order, but where the bias can vary depending on
the values of the two elements. More precisely, we are given input parameters $P = \{p_{i,j}\}$ for
all $1 \le i, j \le n$. The Markov chain $\mathcal{M}_{nn}$ iteratively chooses a pair of adjacent
elements uniformly, and if they are $i$ and $j$ we put $i$ ahead of $j$ with probability $p_{i,j}$
and we put $j$ ahead of $i$ with probability $p_{j,i} = 1 - p_{i,j}$. We are interested in
understanding whether $\mathcal{M}_{nn}$ is efficient in this generalized context. We call the case
where $1/2 \le p_{i,j} \le 1$ for all $i < j$ positively biased. In this case, the fully ordered
permutation $1, 2, \dots, n$ is at least as likely in stationarity as every other permutation. It
is not difficult to see that $\mathcal{M}_{nn}$ can take exponential time without this condition.

The problem of bounding the mixing rate of $\mathcal{M}_{nn}$ in the variable bias setting was
raised by Jim Fill [9], who considered it in the context of the Move-Ahead-One (MA1) self-organizing
list update algorithm. In the MA1 protocol, elements are chosen according to some underlying
distribution and they move up by one in a linked list after each request is serviced, if possible.
Thus, the most frequently requested elements will move to the front of the list and will
eventually require less access time. If we consider a pair of adjacent elements $i$ and $j$, the
probability of performing a transposition that moves $i$ ahead of $j$ is proportional to $i$'s
request frequency, and similarly the probability of moving $j$ ahead of $i$ is proportional to
$j$'s frequency, so the transposition rates vary depending on $i$ and $j$ and we are always more
likely to put things in order (according to their request frequencies) than out of order. Fill
conjectured that when the transposition probabilities $P$ also satisfy a monotonicity condition
whereby $p_{i,j} \le p_{i,j+1}$ and $p_{i,j} \ge p_{i+1,j}$ for all $1 \le i < j \le n$, then the
chain is always rapidly mixing. In fact, he conjectured that the spectral gap is always minimized
when $p_{i,j} = 1/2$ for all $i, j$, a problem he refers to as the "gap problem." He verified that
the conjecture is true for $n = 4$ and gave experimental evidence for slightly larger $n$.

Although Fill posed the gap problem in a widely circulated manuscript ten years ago, there
has been very little progress toward solving it. For general $n$, the chain has only been shown to
be rapidly mixing in two settings. The first is the constant bias case, for which Benjamini et al.
[2] showed a mixing time of $\Theta(n^2)$ when $p_{i,j} = p > 1/2$ for all $i < j$. The second case
has all of the $p_{i,j}$ with $i < j$ equal to 1/2 or 1; in this context the nearest neighbor chain
$\mathcal{M}_{nn}$ samples linear extensions of a partial order and was shown by Bubley and Dyer
[1] to mix in $O(n^3 \log n)$ time.

Our results: In this paper we show that the Markov chain $\mathcal{M}_{nn}$ is always rapidly
mixing for two significantly larger classes of inputs which we call "Choose Your Weapon" and
"League Hierarchies." In the Choose Your Weapon class we are given a set of input parameters
$r_1, \dots, r_{n-1}$ representing each player's ability to win a duel with his or her weapon of
choice. When a pair of neighboring players are chosen to compete, the dominant player gets to
choose the weapon, thus determining his or her probability of winning the match. In other words, we
set $p_{i,j} = r_i$ when $i < j$. We show that the nearest neighbor transposition chain
$\mathcal{M}_{nn}$ is rapidly mixing for any choice of $\{r_i\}$. In the League Hierarchy class we
are given input parameters $q_1, \dots, q_{n-1}$ along with a binary tree $T$ whose internal
vertices are labeled with the $q_i$ and whose leaves are labeled with the elements $1, \dots, n$.
We think of the leaves descending from the left branch of the root as the A-league and the right
branch as the B-league, and whenever players from the two leagues are matched up, the player from
the A-league has an advantage indicated by the probability associated with the root. Likewise,
within the A-league we have Tier-1 and Tier-2 players, and the probability that Tier-1 players win
matches against Tier-2 players is determined by the probability at the root of that subtree. Thus,
$p_{i,j} = q_{i \wedge j}$ for all $i < j$, where $i \wedge j$ is the lowest common ancestor of the
leaves labelled $i$ and $j$. We show that a related chain including additional transpositions is
rapidly mixing for any choice of $\{q_i\}$, and that $\mathcal{M}_{nn}$ is as well if the
$\{q_i\}$ additionally satisfy "weak monotonicity" (i.e., $p_{i,j} \le p_{i,j+1}$ if $j > i$). We
note that both of these classes are generalizations of the uniform bias setting, which can be seen
by taking all of the $r_i$ or $q_i$ to be constant.

In addition, we disprove the most general form of the conjecture by constructing a set $P$ for
which the chain requires exponential time, even in the positive bias case where $p_{i,j} \ge 1/2$
for all $i < j$. Our example is motivated by models in statistical physics that exhibit a phase
transition arising from a "disordered phase" of high entropy and low energy, an "ordered phase" of
high energy and low entropy, and a bad cut separating them that is low in both energy and entropy.
This example does not satisfy the monotonicity condition of Fill, but does give insight into why
bounding the mixing rate of the chain in more general settings has proven quite challenging.

Techniques: For the positive results, our strategy is to use various combinatorial
representations of permutations and interpret the moves of $\mathcal{M}_{nn}$ in these new
settings. In each case there is a natural Markov chain in the new setting including additional
moves (also transpositions) that can be analyzed using simple arguments. We then reinterpret the
new moves in terms of the original permutations so that we can deduce bounds on the mixing rate of
the nearest neighbor transposition chain as well. In each case the new Markov chain consists of a
family of transpositions and is itself interesting in the context of generating random
permutations.

For the Choose Your Weapon class, we map permutations to Inversion Tables [12, 17] that, for
each element $i$, record how many elements $j > i$ come before $i$ in the permutation. We consider
a Markov chain $\mathcal{M}_{inv}$ that simply increments or decrements a single element of the
inversion table in each step; using the bijection with permutations this corresponds to adding
additional transpositions of elements that are not necessarily nearest neighbors to the Markov
chain $\mathcal{M}_{nn}$. Remarkably, this allows $\mathcal{M}_{inv}$ to decompose into a product
of simple one-dimensional random walks, and bounding the convergence time is very straightforward.
Finally, we use comparison techniques [7, 16] to bound the mixing time of the nearest neighbor
chain $\mathcal{M}_{nn}$ for all choices of inputs $r_1, \dots, r_{n-1}$. This approach also gives
a new, far simpler proof of fast mixing in the case of uniform bias.

For the League Hierarchy class, we introduce a new combinatorial representation of the
permutation that associates a bit string $b_v$ to each node $v$ of a binary tree with $n$ leaves.
Specifically, $b_v \in \{L, R\}^{\ell_v}$ where $\ell_v$ is the number of leaves in $t_v$, the
subtree rooted at $v$, and for each element $i$ of the sub-permutation corresponding to the leaves
of $t_v$, $b_v(i)$ records whether $i$ lies under the left or the right branch of $v$. The set of
these bit strings is in bijection with the permutations. We consider a chain $\mathcal{M}_{tree}$
that allows transpositions exactly when they correspond to a nearest neighbor transposition in
exactly one of the bit strings. Thus, the mixing time of $\mathcal{M}_{tree}$ decomposes into a
product of $n - 1$ ASEP chains and we can conclude that the chain $\mathcal{M}_{tree}$ is rapidly
mixing using results in the uniform bias case [2, 10]. Again, we use comparison techniques to
conclude that the nearest neighbor chain is also rapidly mixing when we have weak monotonicity,
although $\mathcal{M}_{tree}$, which simply allows additional transpositions, is always rapidly
mixing.

For the negative result showing slow mixing, the choice of $P$ was motivated by a related
question arising in the context of biased staircase walks [10]. In that context, we are sampling
ASEP configurations with $n$ zeros and $n$ ones, which map bijectively onto walks on the Cartesian
lattice from $(0, n)$ to $(n, 0)$ that always go to the right or down. The probability of each walk
$w$ is proportional to $\prod_{xy < w} \lambda_{xy}$, where the bias $\lambda_{xy} > 1/2$ is
assigned to the square at $(x, y)$ and $xy < w$ whenever the square at $(x, y)$ lies underneath the
walk $w$. We show that there are settings of the $\{\lambda_{xy}\}$ which cause the chain to be
slowly mixing from any starting configuration (or walk). In particular, we show that at
stationarity the most likely configurations will be concentrated near the diagonal from $(0, n)$ to
$(n, 0)$ (the high entropy, low energy states) or they will extend close to the point $(n, n)$ (the
high energy, low entropy states), but it will be unlikely to move between these sets of states
because there is a bottleneck that has both low energy and low entropy. Finally, we use the
reduction from biased permutations to biased lattice paths to produce a positively biased set of
probabilities $P$ for which $\mathcal{M}_{nn}$ also requires exponential time to mix from any
starting configuration.

2 Preliminaries 

We begin by formalizing our model. Let $\Omega = S_n$ be the set of all permutations
$\sigma = (\sigma(1), \sigma(2), \dots, \sigma(n))$ of $n$ integers. We consider Markov chains on
$\Omega$ whose transitions transpose two elements of the permutation. A permutation $\sigma$ is
represented as a list of elements, $\sigma(1), \sigma(2), \dots, \sigma(n)$. We are also given a
set $P$, consisting of $p_{i,j} \in [0, 1]$ for each $1 \le i \neq j \le n$, where for any $i < j$,
$p_{i,j} \ge 1/2$ and $p_{j,i} = 1 - p_{i,j}$. The Markov chain $\mathcal{M}_{nn}$ will sample from
$\Omega$ using $P$.

The Nearest Neighbor Markov chain $\mathcal{M}_{nn}$

Starting at any permutation $\sigma_0$, iterate the following:

• At time $t$, select an index $i \in [n - 1]$ uniformly at random (u.a.r.).

- Swap the elements $\sigma_t(i), \sigma_t(i+1)$ with probability $p_{\sigma_t(i+1), \sigma_t(i)}$ to obtain $\sigma_{t+1}$.

- Do nothing with probability $p_{\sigma_t(i), \sigma_t(i+1)}$ so that $\sigma_{t+1} = \sigma_t$.
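A single step of the nearest neighbor chain can be sketched in a few lines of Python. This is only an illustration of the update rule above; the dictionary layout `p[(a, b)]` (probability of placing `a` ahead of `b`) and the helper `constant_bias` are assumptions made for the example, not notation from the paper.

```python
import random


def nn_step(perm, p, rng=random):
    """One step of the nearest neighbor chain on a list `perm` of 1..n.

    `p[(a, b)]` is the probability of placing a ahead of b,
    so p[(a, b)] + p[(b, a)] == 1 (assumed data layout).
    """
    n = len(perm)
    i = rng.randrange(n - 1)            # 0-based index of the adjacent pair
    left, right = perm[i], perm[i + 1]
    # Swap with probability p[right, left]; otherwise keep the permutation.
    if rng.random() < p[(right, left)]:
        perm = perm[:i] + [right, left] + perm[i + 2:]
    return perm


def constant_bias(n, q):
    """Hypothetical helper: p[(a, b)] = q whenever a < b (constant bias case)."""
    return {(a, b): (q if a < b else 1 - q)
            for a in range(1, n + 1)
            for b in range(1, n + 1) if a != b}
```

With `q = 1` every out-of-order adjacent pair that is selected gets sorted, so the chain behaves like a randomized bubble sort; smaller values of `q` allow moves that create inversions.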

The Markov chain $\mathcal{M}_{nn}$ connects the state space, since every permutation $\sigma$
can move to the ordered permutation $(1, 2, \dots, n)$ (and back) using the bubble sort algorithm.
Since $\mathcal{M}_{nn}$ is also aperiodic, this implies that $\mathcal{M}_{nn}$ is ergodic. For an
ergodic Markov chain with transition probabilities $\mathcal{P}$, if some assignment of
probabilities $\pi$ satisfies the detailed balance condition
$\pi(\sigma)\mathcal{P}(\sigma, \tau) = \pi(\tau)\mathcal{P}(\tau, \sigma)$ for every
$\sigma, \tau \in \Omega$, then $\pi$ is the stationary distribution of the Markov chain [13]. It
is easy to see that for $\mathcal{M}_{nn}$, the distribution
$\pi(\sigma) = \prod_{i < j} p_{\sigma(i), \sigma(j)} / Z$, where $Z$ is the normalizing constant
$\sum_{\sigma \in \Omega} \prod_{i < j} p_{\sigma(i), \sigma(j)}$, satisfies detailed balance, and
is thus the stationary distribution.
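For small $n$ the detailed balance claim can be checked by brute force. The sketch below enumerates all permutations, computes the product-form weights, and exposes the transition probability of a single adjacent swap; the function names and the dictionary layout `p[(a, b)]` are assumptions chosen for the example.

```python
from itertools import permutations


def weight(perm, p):
    """Unnormalized weight: product of p[(a, b)] over all pairs with a before b."""
    w = 1.0
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            w *= p[(perm[i], perm[j])]
    return w


def stationary(n, p):
    """Normalized product-form distribution over all permutations of 1..n."""
    states = list(permutations(range(1, n + 1)))
    z = sum(weight(s, p) for s in states)
    return {s: weight(s, p) / z for s in states}


def nn_transition(s, t, p):
    """P(s, t) when t differs from s by one adjacent swap, else 0 (for s != t)."""
    n = len(s)
    for i in range(n - 1):
        if t == s[:i] + (s[i + 1], s[i]) + s[i + 2:]:
            return p[(s[i + 1], s[i])] / (n - 1)
    return 0.0
```

Checking `pi[s] * nn_transition(s, t, p) == pi[t] * nn_transition(t, s, p)` (up to floating point error) for every adjacent swap confirms detailed balance on a given instance.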

The Markov chain $\mathcal{M}_t$ can make any transposition at each step, while maintaining the
stationary distribution $\pi$. The transition probabilities of $\mathcal{M}_t$ can be quite
complicated, since swapping two distant elements in the permutation consists of many transitions
of $\mathcal{M}_{nn}$, each with different probabilities. In the following sections, we will
introduce two other Markov chains whose transitions are a subset of those of $\mathcal{M}_t$ and
for which we can describe the transition probabilities succinctly.

2.1 Convergence rates of Markov chains 

Next, we present some background on Markov chains. The total variation distance between the
stationary distribution $\pi$ and the distribution of the Markov chain at time $t$ is
$\|\mathcal{P}^t, \pi\|_{tv} = \max_{x \in \Omega} \frac{1}{2} \sum_{y \in \Omega} |\mathcal{P}^t(x, y) - \pi(y)|$,
where $\mathcal{P}^t(x, y)$ is the $t$-step transition probability. The efficiency of a Markov
chain $\mathcal{M}$ is often measured by its mixing time $\tau(\epsilon)$. For all $\epsilon > 0$,
we define $\tau(\epsilon) = \min\{t : \|\mathcal{P}^{t'}, \pi\|_{tv} \le \epsilon, \forall t' \ge t\}$.
We say that a Markov chain is rapidly mixing if there exists a polynomial $p$ such that
$\tau(\epsilon) = O(p(n, \log(\epsilon^{-1})))$, where $n$ is the size of each configuration in
$\Omega$.
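For a chain small enough to store its transition matrix, the mixing time can be computed directly from these definitions. The following is a minimal sketch using plain Python lists; the function names are assumptions. It relies on the standard fact that the worst-case total variation distance is non-increasing in $t$, so the first time the distance drops below $\epsilon$ is the mixing time.

```python
def tv_distance(row, pi):
    """Total variation distance between two distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(row, pi))


def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mixing_time(P, pi, eps, t_max=10_000):
    """Smallest t with max_x ||P^t(x, .) - pi||_tv <= eps, or None if t_max is hit."""
    Pt = P                                   # Pt holds P^t at the start of iteration t
    for t in range(1, t_max + 1):
        if max(tv_distance(row, pi) for row in Pt) <= eps:
            return t
        Pt = mat_mul(Pt, P)
    return None
```

For example, the lazy two-state chain `[[0.9, 0.1], [0.1, 0.9]]` with uniform stationary distribution has distance $0.5 \cdot 0.8^t$ after $t$ steps, which can be compared against the value returned for a chosen `eps`.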

In Section 3 we will use a standard technique called coupling. A coupling is a Markov chain
$(X_t, Y_t)_{t=0}^{\infty}$ on $\Omega \times \Omega$ such that each of the processes $X_t$ and
$Y_t$ is a faithful copy of $\mathcal{M}$, and if $X_t = Y_t$, then $X_{t+1} = Y_{t+1}$. Given such
a coupling, define the coupling time $T$ as follows:

$$T = \max_{x, y} E[\min\{t : X_t = Y_t \mid X_0 = x, Y_0 = y\}].$$

Then the following theorem (see, e.g., [1]) relates the coupling time and the mixing time.

Theorem 2.1. $\tau(\epsilon) \le \lceil T e \ln \epsilon^{-1} \rceil$.

In each of Sections 3 and 4 we introduce new Markov chains to sample from the same
distribution as $\mathcal{M}_{nn}$. In order to obtain bounds on the mixing time of
$\mathcal{M}_{nn}$, we will compare $\mathcal{M}_{nn}$ with these auxiliary chains in Section 5. If
$P$ and $P'$ are the transition matrices of two reversible Markov chains on the same state space
$\Omega$ with the same stationary distribution $\pi$, the comparison method (see [7] and [16])
allows us to relate the mixing times of these two chains. Let
$E(P) = \{(\sigma, \beta) : P(\sigma, \beta) > 0\}$ and
$E(P') = \{(\sigma, \beta) : P'(\sigma, \beta) > 0\}$ denote the sets of edges of the two graphs,
viewed as directed graphs. For each $\sigma, \beta$ with $P'(\sigma, \beta) > 0$, define a path
$\gamma_{\sigma\beta}$ using a sequence of states $\sigma = \sigma_0, \sigma_1, \dots, \sigma_k = \beta$
with $P(\sigma_i, \sigma_{i+1}) > 0$, and let $|\gamma_{\sigma\beta}|$ denote the length of the
path. Let $\Gamma(\upsilon, \omega) = \{(\sigma, \beta) \in E(P') : (\upsilon, \omega) \in \gamma_{\sigma\beta}\}$
be the set of paths that use the transition $(\upsilon, \omega)$ of $P$. Finally, let
$\pi_* = \min_{\rho \in \Omega} \pi(\rho)$ and define

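$$A = \max_{(\upsilon, \omega) \in E(P)} \frac{1}{\pi(\upsilon) P(\upsilon, \omega)} \sum_{\Gamma(\upsilon, \omega)} |\gamma_{\sigma\beta}| \, \pi(\sigma) P'(\sigma, \beta).$$

The following formulation of the comparison method is due to Randall and Tetali [16]. Here
$\tau(\epsilon)$ and $\tau'(\epsilon)$ denote the mixing times of $P$ and $P'$, respectively.

Theorem 2.2. With the above notation, for $0 < \epsilon < 1$, we have

$$\tau(\epsilon) \le \frac{4 \log(1/(\epsilon \pi_*))}{\log(1/2\epsilon)} A \tau'(\epsilon).$$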



3 Choose Your Weapon 

In the Choose Your Weapon class, we are given $1/2 \le r_1, r_2, \dots, r_{n-1} < 1$ and a set $P$
satisfying $p_{i,j} = r_i$ if $i < j$ and $p_{i,j} = 1 - r_j$ if $j < i$. We show that a new Markov
chain $\mathcal{M}_{inv}$ is rapidly mixing under these conditions, which will imply that
$\mathcal{M}_{nn}$ and $\mathcal{M}_t$ are as well, as we show in Section 5. The Markov chain
$\mathcal{M}_{inv}$ acts on the inversion table of the permutation [12, 17], which has an entry for
each $i \in [n]$ counting the number of inversions involving $i$; that is, the number of values
$j > i$ where $j$ comes before $i$ in the permutation (see Figure 1). It is easy to see that the
$i$th element of the inversion table is an integer between 0 and $n - i$. In fact, the function $I$
is a bijection between the set of permutations and the set $\mathcal{I}$ of all possible inversion
tables (all sequences $X = (x_1, x_2, \dots, x_n)$ where $0 \le x_i \le n - i$ for all $i \in [n]$).
To see this, we will construct a permutation from any inversion table $X \in \mathcal{I}$. Place
the element 1 in the $(x_1 + 1)$st position of the permutation. Next, there are $n - 1$ slots
remaining. Among these, place the element 2 in the $(x_2 + 1)$st position remaining (ignoring the
slot already filled by 1). Continuing, after placing $i - 1$ elements into the permutation, there
are $n - i + 1$ slots remaining, and we place the element $i$ into the $(x_i + 1)$st position among
the remaining slots. This proves that $I$ is a bijection from $S_n$ to $\mathcal{I}$.

$\sigma = 8\ 1\ 5\ 3\ 7\ 4\ 6\ 2$
$I(\sigma) = 1\ 6\ 2\ 3\ 1\ 2\ 1\ 0$

Figure 1: The inversion table for a permutation.
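The bijection between permutations and inversion tables is easy to express in code. The sketch below follows the slot-filling construction described above; the function names are assumptions, and permutations are represented as Python lists of the integers $1, \dots, n$.

```python
def inversion_table(perm):
    """x[i-1] = number of elements j > i that appear before i in perm."""
    n = len(perm)
    pos = {v: k for k, v in enumerate(perm)}
    return [sum(1 for j in range(i + 1, n + 1) if pos[j] < pos[i])
            for i in range(1, n + 1)]


def from_inversion_table(x):
    """Inverse map: place element i in the (x_i + 1)-st slot that is still empty."""
    n = len(x)
    perm = [None] * n
    for i in range(1, n + 1):
        empty = [k for k in range(n) if perm[k] is None]
        perm[empty[x[i - 1]]] = i
    return perm
```

Applying `from_inversion_table(inversion_table(perm))` returns the original permutation, for instance for the permutation 8 1 5 3 7 4 6 2 of Figure 1. The reconstruction works because every element placed before $i$ is smaller than $i$, so the empty slots skipped by $i$ are exactly those later filled by larger elements.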



Given this bijection, a natural algorithm for sampling permutations is to perform the following
local Markov chain on inversion tables: select a position $i \in [n]$ and attempt to either add one
or subtract one from $x_i$, according to the appropriate probabilities. In terms of permutations,
this amounts to adding or removing an inversion involving $i$ without affecting the number of
inversions involving any other integer, and is achieved by swapping the element $i$ with an element
$j > i$ such that every element in between is smaller than both $i$ and $j$. If $i$ moves ahead of
$j$, this move happens with probability $p_{i,j}$ because for each $k$ that $i$ and $j$ are swapped
past, $k < i, j$, so $p_{k,i} = r_k = p_{k,j}$ (since each of these depends only on $k$), so the
net effect on the distribution is neutral, and the detailed balance condition ensures that $\pi$
is the correct stationary distribution. Formally, the Markov chain is defined as follows.

The Inversion Markov chain $\mathcal{M}_{inv}$

Starting at any permutation $\sigma_0$, iterate the following:

• Select an element $i \in [n]$ with probability $(n - i)/\binom{n}{2}$ and a bit $b \in \{-1, +1\}$.

- If $b = +1$, let $j$ be the first element after element $i$ in $\sigma_t$ such that $j > i$. With prob. $p_{j,i}/2 = (1 - r_i)/2$, obtain $\sigma_{t+1}$ from $\sigma_t$ by swapping $i$ and $j$.

- If $b = -1$, let $j$ be the last element before element $i$ in $\sigma_t$ such that $j > i$. With prob. $p_{i,j}/2 = r_i/2$, obtain $\sigma_{t+1}$ from $\sigma_t$ by swapping $i$ and $j$.

• With prob. 1/2, $\sigma_{t+1} = \sigma_t$.
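The following Python sketch shows one way to simulate a step of the inversion chain on permutations. It is a reading of the description above under two assumptions that the text does not spell out: element $i$ is selected with probability proportional to $n - i$, and the bit $b$ is chosen uniformly. The dictionary `r` (mapping each element to its bias) and the function name are also hypothetical.

```python
import random


def inv_step(perm, r, rng=random):
    """One step of the inversion chain on a list `perm` of 1..n.

    Assumptions: i is drawn with weight n - i, b is uniform in {-1, +1},
    and r[i] is the bias of element i.
    """
    n = len(perm)
    i = rng.choices(range(1, n), weights=[n - v for v in range(1, n)])[0]
    b = rng.choice((-1, +1))
    k = perm.index(i)
    if b == +1:
        # first element after i that is larger than i
        j_pos = next((m for m in range(k + 1, n) if perm[m] > i), None)
        prob = (1 - r[i]) / 2
    else:
        # last element before i that is larger than i
        j_pos = next((m for m in range(k - 1, -1, -1) if perm[m] > i), None)
        prob = r[i] / 2
    if j_pos is not None and rng.random() < prob:
        perm = list(perm)
        perm[k], perm[j_pos] = perm[j_pos], perm[k]
    return perm
```

Because every element strictly between $i$ and $j$ is smaller than both, each accepted move changes exactly one entry of the inversion table by one, which is what makes the chain decompose into independent one-dimensional walks.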

This Markov chain contains the moves of $\mathcal{M}_{nn}$ (and therefore also connects the state
space). Although elements can jump across several elements, it is still fairly local compared with
the general transposition chain $\mathcal{M}_t$, which has $\binom{n}{2}$ choices at every step,
since $\mathcal{M}_{inv}$ has at most $2n$.

The Markov chain $\mathcal{M}_{inv}$ is essentially a product of $n$ independent one-dimensional
processes. The $i$th process is just a random walk bounded between 0 and $n - i$, which moves up
with probability $1 - r_i$ and down with probability $r_i$; hence its mixing time is $O(n^2)$,
unless $r_i$ is bounded away from 1/2, in which case its mixing time is $O(n)$. However, each
process is slowed down by a factor of $n$ since we only update one process at each step. To make
this argument formal, we will use Theorem 7.1, which bounds the mixing time of a product of
independent Markov chains and whose elementary proof is deferred to Section 7.

Theorem 3.1. Let $1/2 \le r_1, \dots, r_{n-1} < 1$ be constants, and let $r_{max} = \max_i r_i$.
Assume that $p_{i,j} = r_{\min\{i,j\}}$.

1. If each $r_i > 1/2$ then the mixing time of $\mathcal{M}_{inv}$ on biased permutations with
these $p_{i,j}$ values is $O(n^2 \ln(n/\epsilon))$.

2. Otherwise, the mixing time of $\mathcal{M}_{inv}$ is $O(n^3 \ln(n/\epsilon))$.

To prove this theorem, we need to analyze the one-dimensional process $\mathcal{M}(r, k)$,
bounded between 0 and $k$, which chooses to move up with probability $r \ge 1/2$ and down with
probability $1 - r$ at each step, if possible. This simple random walk is well-studied; we include
the proof for completeness.

Lemma 3.2. Let $1/2 \le r < 1$ be constant. Then the Markov chain $\mathcal{M}(r, k)$ has mixing time

1. $\tau(\epsilon) = O(k \ln \epsilon^{-1})$ if $r$ is a constant bigger than 1/2, and

2. $\tau(\epsilon) = O(k^2 \ln \epsilon^{-1})$ if $r = 1/2$.

Proof. We use a variation on coupling. We use the trivial coupling, which chooses to move in the
same direction in each Markov chain. Notice that the Markov chain $\mathcal{M}(r, k)$ is monotone
with respect to this coupling, in the sense that if $X_t$ is below $Y_t$, then it will remain so
until $X_t = Y_t$. Thus the time until the chains couple is bounded by the time it takes for a
process $Z_t$, where $Z_0 = 0$, to reach height $k$. However, $Z_t$ is just a biased random walk
bounded between 0 and $k$. First, we notice that $Z_t$ is non-decreasing in expectation; that is,
for all $t \ge 0$, $E[Z_{t+1} - Z_t] \ge 0$:

$$E[Z_{t+1} - Z_t] = r - (1 - r) = 2r - 1 \ge 0.$$

Consider the case that $r > 1/2$. Define $W(t) = k - Z(t) + (2r - 1)t$. Examining the expected
difference between $W(t)$ and $W(t + 1)$, we see

$$E[W(t + 1) - W(t)] = E[-Z(t + 1) + 2r - 1 + Z(t)] = 0.$$

Also, since the differences $W(t + 1) - W(t)$ are bounded, $\{W(t)\}$ is a martingale. The time
$T = \min\{t : Z_t = k\}$ is a stopping time for the process $W(t)$, so we may apply the Optional
Stopping Theorem for martingales to deduce that

$$E[W(T)] = W(0) = k.$$

However, since

$$E[W(T)] = E[k - Z(T) + (2r - 1)T] = (2r - 1)E[T],$$

it follows that $E[T] = k/(2r - 1)$. Recall from Theorem 2.1 that

$$\tau(\epsilon) = O(E[T] \ln \epsilon^{-1}) = O(k/(2r - 1) \ln \epsilon^{-1}) = O(k \ln \epsilon^{-1}).$$

Suppose now that $r = 1/2$. This case is similar, and follows from Lemma 6 of [14]. Notice
$E[(Z(t+1) - Z(t))^2] = r + (1 - r) = 1$. Therefore $E[T] \le k(2k - k)/1 = k^2$. Hence
$\tau(\epsilon) = O(k^2 \ln \epsilon^{-1})$.

□



Finally, we can use these bounds to prove Theorem 3.1.

Proof of Theorem 3.1. The $i$th process is chosen with probability $(n - i)/\binom{n}{2}$.
Therefore, by Theorem 7.1, the mixing time of $\mathcal{M}_{inv}$ satisfies

$$\tau(\epsilon) \le \frac{\binom{n}{2}}{n - i}(n - i)\ln(2n/\epsilon) = \binom{n}{2}\ln(2n/\epsilon) = O(n^2 \ln(n/\epsilon))$$

when each $r_i$ is bounded away from 1/2. Otherwise,

$$\tau(\epsilon) \le \frac{\binom{n}{2}}{n - i}(n - i)^2 \ln(2n/\epsilon) = O(n^3 \ln(n/\epsilon)).$$

□



Remark 3.3. The same proof also applies to the case where the probability of swapping $i$ and $j$
depends on the object with lower rank (i.e., we are given $r_2, \dots, r_n$ and we let
$p_{i,j} = r_j$ for all $i < j$). This case is related to a variant of the MA1 list update
algorithm, where if a record is requested, we try to move the associated record $x$ ahead of its
immediate predecessor in the list, if it exists. If it has higher rank than its predecessor, then
it always succeeds, while if its rank is lower we move it ahead with probability
$f_x = r_x/(1 + r_x) \le 1$.
4 League Hierarchy

In this section, we turn to a second class of $P$ that have what we call league structure. Let $T$
be a proper rooted binary tree with $n$ leaf nodes, labeled $1, \dots, n$ in sorted order. Each
non-leaf node $v$ of this tree is labeled with a value $1/2 \le q_v \le 1$. For $i, j \in [n]$, let
$i \vee j$ be the lowest common ancestor of the leaves labeled $i$ and $j$. We say that $P$ has
league structure $T$ if for all $i < j$, $p_{i,j} = q_{i \vee j}$ and $p_{j,i} = 1 - p_{i,j}$. For
example, Figure 2a shows a set $P$ such that $p_{1,4} = .8$, $p_{4,9} = .9$, and $p_{5,8} = .7$.




Figure 2: (a) A set $P$ with league structure, and (b) the corresponding tree-encoding of the permutation 519386742.

When $T$ is a complete binary tree and $q_{v_1} = q_{v_2}$ for each $v_1$ and $v_2$ on the same
level of the tree, this is precisely the representation of the winning probabilities for a
tournament described in the introduction. We define the Markov chain $\mathcal{M}_{tree}(T)$ over
permutations, given a set $P$ with league structure $T$.

The Markov chain $\mathcal{M}_{tree}(T)$

Starting at any permutation $\sigma_0$, iterate the following:

• Select distinct $a, b \in [n]$ u.a.r. Assume $a < b$.

• If every number between $a$ and $b$ in the permutation $\sigma_t$ is not a descendant in $T$ of $a \vee b$, obtain $\sigma_{t+1}$ from $\sigma_t$ by placing $a, b$ in order with probability $p_{a,b}$ and out of order with probability $1 - p_{a,b}$, leaving all elements between them fixed.

• Otherwise, $\sigma_{t+1} = \sigma_t$.
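The steps above can be sketched in Python as follows. The tree encoding is an assumption made for the example: a leaf is an integer and an internal node is a tuple `(q, left, right)` carrying its label $q_v$; the function names are likewise hypothetical.

```python
import random


def leaves(node):
    """Leaf labels under a node (a leaf is an int, an internal node is (q, left, right))."""
    if isinstance(node, int):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def lca(tree, a, b):
    """Return (q, set of descendant leaves) for the lowest common ancestor of a != b."""
    q, left, right = tree
    for child in (left, right):
        under = leaves(child)
        if a in under and b in under:
            return lca(child, a, b)
    return q, set(leaves(tree))


def tree_step(perm, tree, rng=random):
    """One step of the tree chain on a list `perm` of 1..n."""
    n = len(perm)
    a, b = sorted(rng.sample(range(1, n + 1), 2))
    q, desc = lca(tree, a, b)
    i, j = sorted((perm.index(a), perm.index(b)))
    if any(x in desc for x in perm[i + 1:j]):
        return perm                     # blocked: a descendant of the LCA lies between
    new = list(perm)
    # a goes first with probability p_{a,b} = q, otherwise b goes first
    new[i], new[j] = (a, b) if rng.random() < q else (b, a)
    return new
```

For example, with `tree = (0.9, (0.8, 1, 2), (0.7, 3, 4))` and the permutation `[1, 3, 2, 4]`, the pair (1, 2) may be re-ordered because 3 is not a descendant of their common ancestor, while the pair (1, 4) is blocked by 3 and 2.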

First, we show that this Markov chain samples from the same distribution as $\mathcal{M}_{nn}$.
Swapping arbitrary non-adjacent elements $a$ and $b$ could potentially change the weight of the
permutation dramatically. However, for any element $c$ that is not a descendant in $T$ of
$a \vee b$, the relationship between $a$ and $c$ is the same as the relationship between $b$ and
$c$. Thus the league structure ensures that swapping $a$ and $b$ only changes the weight by a
multiplicative factor of $\lambda_{a,b} = p_{a,b}/p_{b,a}$.

Lemma 4.1. The Markov chain $\mathcal{M}_{tree}(T)$ has the same stationary distribution as $\mathcal{M}_{nn}$.

Proof. Let $\pi$ be the stationary distribution of $\mathcal{M}_{nn}$, and let
$(\sigma_1, \sigma_2)$ be a transition in $\mathcal{M}_{tree}(T)$. It suffices to show that the
detailed balance condition holds for this transition with the stationary distribution $\pi$.
Recall that we may express $\pi(\sigma) = \prod_{i,j : i <_\sigma j} p_{i,j} / Z$, where
$Z = \sum_{\sigma \in \Omega} \prod_{i,j : i <_\sigma j} p_{i,j}$ and $i <_\sigma j$ means that $i$
precedes $j$ in $\sigma$. The transition $(\sigma_1, \sigma_2)$ transposes some two elements
$a <_{\sigma_1} b$, where every element between $a$ and $b$ in $\sigma_1$ is not a descendant of
$a \vee b$ in $T$. Let $x_1, \dots, x_k$ be those elements. Thus, the path from $a$ or $b$ to $x_i$
in $T$ must pass through $a \vee b$ and go to another part of the tree. For every such element
$x_i$, $a \vee x_i = (a \vee b) \vee x_i = b \vee x_i$. From this observation, we see from the
league structure that $p_{a x_i} = p_{b x_i}$ for every $x_i$ between $a$ and $b$. In particular,
$x_i$ is either greater than both $a$ and $b$ or less than both $a$ and $b$, since all integers $c$
such that $a < c < b$ are necessarily descendants of $a \vee b$. Therefore,

$$\frac{\pi(\sigma_1)}{\pi(\sigma_2)} = \frac{p_{ab} \prod_i p_{a x_i} p_{x_i b}}{p_{ba} \prod_i p_{b x_i} p_{x_i a}} = \frac{p_{ab}}{p_{ba}}.$$

This is exactly the ratio of the transition probabilities in $\mathcal{M}_{tree}(T)$, thus
$\mathcal{M}_{tree}(T)$ also has stationary distribution $\pi$. □

The key to the proof that $\mathcal{M}_{tree}(T)$ is rapidly mixing is again to decompose the
chain into $n - 1$ independent Markov chains, $\mathcal{M}_1, \mathcal{M}_2, \dots, \mathcal{M}_{n-1}$,
one for each non-leaf node of the tree $T$. To this end, we introduce an alternate representation
of a permutation as a set of binary strings arranged like the tree $T$. For each non-leaf node $v$
in the tree $T$, let $L(v)$ be its left descendants, and $R(v)$ be its right descendants. We now do
the following: Given the permutation $\sigma$, list each descendant $x$ of $v$ in the order we
encounter it in $\sigma$; these are parenthesized in Figure 2b. Then for each listed element $x$,
write a 1 if $x \in L(v)$ and a 0 if $x \in R(v)$. This is the final binary encoding in Figure 2b.
We see that any $\sigma$ will lead to an assignment of binary strings at each non-leaf node $v$
with $|L(v)|$ ones and $|R(v)|$ zeroes. Next we verify that this is a bijection between the set of
permutations and the set of assignments of such binary strings to the tree $T$. Given any such
assignment of binary strings, we can recursively reconstruct the permutation $\sigma$ as follows.
For each leaf node $i$, let its string be the string "$i$". For any node $v$ with binary string
$b$, determine the strings of its two children. Call these $s_1, s_0$. Interleave the elements of
$s_1$ with $s_0$, choosing an element of $s_1$ for each 1 in $b$, and an element of $s_0$ for each
0. This yields a permutation $\sigma$.
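The encoding and its inverse can be sketched directly from this description. As an assumption for the example, the tree is again a nested tuple `(q, left, right)` with integer leaves, and each internal node is keyed by the tuple of leaves below it; the function names are hypothetical.

```python
def leaves(node):
    """Leaf labels under a node (a leaf is an int, an internal node is (q, left, right))."""
    if isinstance(node, int):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def encode(perm, tree):
    """Map each internal node (keyed by its leaves) to its binary string."""
    strings = {}

    def visit(node):
        if isinstance(node, int):
            return
        _, left, right = node
        l, r = set(leaves(left)), set(leaves(right))
        # walk the permutation, writing 1 for left descendants and 0 for right ones
        strings[tuple(leaves(node))] = "".join(
            "1" if x in l else "0" for x in perm if x in l or x in r)
        visit(left)
        visit(right)

    visit(tree)
    return strings


def decode(strings, tree):
    """Rebuild the permutation by interleaving the children's sequences."""
    def build(node):
        if isinstance(node, int):
            return [node]
        _, left, right = node
        s1, s0 = iter(build(left)), iter(build(right))
        return [next(s1) if bit == "1" else next(s0)
                for bit in strings[tuple(leaves(node))]]

    return build(tree)
```

Round-tripping `decode(encode(perm, tree), tree)` recovers `perm`, and distinct permutations yield distinct string assignments, which is the bijection used in the analysis.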

With this bijection, we first analyze $\mathcal{M}_{tree}(T)$'s behavior over tree
representations and later extend this analysis to permutations. The Markov chain
$\mathcal{M}_{tree}(T)$, when proposing a swap of the elements $a$ and $b$, will only attempt to
swap them if $a, b$ correspond to some adjacent 0 and 1 in the string associated with $a \vee b$.
Swapping $a$ and $b$ does not affect any other string, so each non-leaf node $v$ represents an
independent exclusion process with $|L(v)|$ ones and $|R(v)|$ zeroes. These exclusion
processes have been well-studied [H [181 121 HH]- We will use the following bounds on the mixing 
times of the symmetric and asymmetric simple exclusion processes. 

Theorem 4.2. Let $\mathcal{M}$ be the exclusion process with parameter $p$ on $k_1$ ones and $k_2$
zeroes, where $k = k_1 + k_2$. Then

1. if $p = 1/2$, $\tau(\epsilon) = O(k^3 \log(k/\epsilon))$.

2. if $p > 1/2$, then $\tau(\epsilon) = O(k(\min\{k_1, k_2\} + \log k)\log(\epsilon^{-1})) = O(k^2 \log(\epsilon^{-1}))$.



The bounds in Theorem 4.2 refer to the exclusion process which selects a position at random and
swaps the two elements in that position with the appropriate probability. However, our process
selects arbitrary pairs $(i, j)$ consisting of a single one and a single zero. Since we only swap
if they are neighboring, this may slow down the chain by a factor of at most $k$.

Since each exclusion process operates independently, the overall mixing time will be roughly
$n$ times the mixing time of each piece, slowed down by the inverse probability of selecting that
process. Next, we will use Theorems 7.1 and 4.2 to prove that $\mathcal{M}_{tree}(T)$ is rapidly
mixing.



Theorem 4.3. If $P$ has league structure $T$, then the mixing time of $\mathcal{M}_{tree}(T)$ under $P$ satisfies

$$\tau_{tree}(\epsilon) = O(n^5 \log(n/\epsilon)).$$

If $P$ is such that each $q_v > 1/2$ is a constant, then $\tau_{tree}(\epsilon) = O(n^3 \log n \log(n/\epsilon))$.
Proof. In order to apply Theorem 7.1 to the Markov chain $\mathcal{M}_{tree}(T)$, we note that for
a node with $k_1$ ones and $k_2$ zeroes ($k = k_1 + k_2$), the probability of selecting that node
is $k_1 k_2 / \binom{n}{2}$. Since $M = n - 1$, Theorem 7.1 implies

$$\tau(\epsilon) \le \frac{n(n - 1)}{2 k_1 k_2} k^4 \ln(2n k_1 k_2/\epsilon) = O(n^5 \log(n/\epsilon)).$$

Of course, if all of the chains have probabilities that are bounded away from 1/2, then we can use
the second bound from Theorem 4.2 to obtain

$$\tau(\epsilon) \le \frac{n(n - 1)}{2 k_1 k_2} k^2 (\min\{k_1, k_2\} + \log k)\log(2n/\epsilon) = \frac{n(n - 1)k^2}{2\max\{k_1, k_2\}}\left(1 + \frac{\log k}{\min\{k_1, k_2\}}\right)\log(2n/\epsilon).$$

There are two cases to consider. Let $0 < c < 1$. If $\min\{k_1, k_2\} \ge c \log k$ then, since
$\max\{k_1, k_2\} \ge k/2$ and $k \le n$,

$$\tau(\epsilon) \le \frac{n(n - 1)k^2}{2\max\{k_1, k_2\}}\left(1 + \frac{1}{c}\right)\log(2n/\epsilon) = O(n^3 \log(n/\epsilon)).$$

Otherwise, $\max\{k_1, k_2\} \ge k - c \log k$, so since $k \le n$,

$$\tau(\epsilon) \le \frac{n(n - 1)k^2}{2(k - c \log k)}(1 + \log k)\log(2n/\epsilon) = \frac{n(n - 1)k}{2\left(1 - \frac{c \log k}{k}\right)}(1 + \log k)\log(2n/\epsilon) = O(n^3 \log n \log(n/\epsilon)).$$

□



5 Bounding the mixing time of $\mathcal{M}_{nn}$ for both classes

Our goal now is to use the comparison method to obtain bounds on the mixing time of
$\mathcal{M}_{nn}$ in the settings of Sections 3 and 4 from the bounds on the mixing times of
$\mathcal{M}_{inv}$ and $\mathcal{M}_{tree}(T)$. When comparing the mixing times of
$\mathcal{M}_{tree}(T)$ and $\mathcal{M}_{nn}$, for example, the goal is to show that a move
$e = (\sigma, \beta)$ of $\mathcal{M}_{tree}(T)$, which is allowed to transpose $i$ and $j$ that
are not necessarily nearest neighbors, can be simulated with a sequence of moves of
$\mathcal{M}_{nn}$. Moreover, we must ensure that our path does not go through transitions that are
much smaller in weight than $\min\{\pi(\sigma), \pi(\beta)\}$. This type of argument is
straightforward for the moves of $\mathcal{M}_{inv}$, and gives some intuition for the more
involved argument to compare $\mathcal{M}_{tree}(T)$ with $\mathcal{M}_{nn}$, which will follow in
Section 5.2.

In the next two sections, we assume that each pij is a constant less than 1; this is to ensure a 
good comparison between the spectral gap and the mixing time. If this condition is not satisfied, 
then the proofs still go through and will give a bound on the spectral gap, but will not provide a 
good bound on the mixing time. 
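For concreteness, one step of a nearest neighbor transposition chain can be sketched as follows. This is only a minimal illustration under stated assumptions: the adjacent pair is assumed to be chosen uniformly at random, and the names `nn_step`, `p`, and `rng` are hypothetical helpers introduced here, not notation from the paper.

```python
import random


def nn_step(perm, p, rng):
    """One step of a nearest neighbor transposition chain (illustrative sketch).

    perm: list holding a permutation of 1..n
    p:    function p(x, y), for x < y, giving the probability that the
          adjacent pair {x, y} is placed in increasing order
    rng:  a random.Random instance
    """
    n = len(perm)
    k = rng.randrange(n - 1)  # assumption: adjacent pair chosen uniformly
    x, y = min(perm[k], perm[k + 1]), max(perm[k], perm[k + 1])
    new = list(perm)
    if rng.random() < p(x, y):
        new[k], new[k + 1] = x, y  # put the pair in order
    else:
        new[k], new[k + 1] = y, x  # put the pair out of order
    return new
```

With $p \equiv 1$ the chain can only remove inversions, so repeated steps sort the list; with $p \equiv 1/2$ it is the unbiased chain.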

5.1 Comparing $\mathcal{M}_{inv}$ with $\mathcal{M}_{nn}$

First, we consider the setting of Section 3, where $p_{i,j}$ depends on $\min\{i, j\}$.

Theorem 5.1. Let $1/2 \le r_1, r_2, \ldots, r_{n-1} < 1$ be constants. Assume $P$ is defined by $p_{i,j} = r_i$ for $i < j$. Then the mixing time of $\mathcal{M}_{nn}$ on biased permutations under $P$ is $O(n^8 \log(n/\epsilon))$.

Here we are using the bound from Theorem 3.1 part 2; if each $p_{i,j}$ is bounded away from 1/2, then we would get a better bound of $O(n^7 \log(n/\epsilon))$ using Theorem 3.1 part 1. Recall that for any $a, b \in [n]$, we defined $\lambda_{a,b} = p_{a,b}/p_{b,a}$.
Proof. In order to apply Theorem 2.2 we need to define, for any transition $e = (\sigma, \beta)$ of the Markov chain $\mathcal{M}_{inv}$, a sequence of transitions of $\mathcal{M}_{nn}$. Let $e$ be a transition of $\mathcal{M}_{inv}$ which performs a transposition on elements $\sigma(i)$ and $\sigma(j)$, where $i < j$. Recall $\mathcal{M}_{inv}$ can only swap $\sigma(i)$ and $\sigma(j)$ if all the elements between them are smaller than both $\sigma(i)$ and $\sigma(j)$. To obtain a sufficient bound on the congestion along each edge, we ensure that in each step of the path, we do not decrease the weight of the configuration. This is easy to do: in the first stage, move $\sigma(i)$ to the right, one step at a time, until it swaps with $\sigma(j)$. This removes an inversion of the type $(\sigma(i), \sigma(k))$ for every $i < k < j$, so clearly we have not decreased the weight of the configuration at any step. Next, move $\sigma(j)$ to the left, one step at a time, until it reaches position $i$. This completes the move $e$, and at each step, we are adding back an inversion of the type $(\sigma(j), \sigma(k))$ for some $i < k < j$. Since $\sigma(k) = \min\{\sigma(j), \sigma(k)\} = \min\{\sigma(i), \sigma(k)\}$, we have $p_{\sigma(k),\sigma(i)} = p_{\sigma(k),\sigma(j)}$ for every $i < k < j$, so in this stage we restore all the inversions destroyed in the first stage, for a net change of $\lambda_{\sigma(j),\sigma(i)}$.
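The two-stage path described above can be sketched in code. This is only an illustration, assuming 0-indexed positions and a precondition check that every element strictly between positions `i` and `j` is smaller than both endpoints; the function name `inv_canonical_path` is hypothetical.

```python
def inv_canonical_path(sigma, i, j):
    """Return the list of configurations visited when sigma[i] and sigma[j]
    (0-indexed, i < j) are transposed using adjacent swaps only."""
    assert 0 <= i < j < len(sigma)
    low = min(sigma[i], sigma[j])
    # Precondition assumed for the move: elements in between are smaller than both.
    assert all(sigma[k] < low for k in range(i + 1, j))

    cur = list(sigma)
    path = [list(cur)]
    # Stage 1: move sigma[i] to the right until it has swapped with sigma[j].
    for k in range(i, j):
        cur[k], cur[k + 1] = cur[k + 1], cur[k]
        path.append(list(cur))
    # Stage 2: move sigma[j] (now at position j - 1) left to position i.
    for k in range(j - 1, i, -1):
        cur[k], cur[k - 1] = cur[k - 1], cur[k]
        path.append(list(cur))
    return path
```

For example, `inv_canonical_path([5, 1, 2, 7], 0, 3)` ends at `[7, 1, 2, 5]` after five adjacent swaps, within the $2n$ length bound used in the proof.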

Given a transition $(v, \omega)$ of $\mathcal{M}_{nn}$ we must upper bound the number of canonical paths $\gamma_{\sigma\beta}$ that use this edge, which we do by bounding the amount of information needed in addition to $(v, \omega)$ to determine $\sigma$ and $\beta$ uniquely. For moves in the first stage, all we need to remember is $\sigma(j)$, because we know $\sigma(i)$ (it is the element moving forward). We also need to remember $i$ (that is, the original location of $\sigma(i)$). Given this information along with $v$ and $\omega$ we can uniquely recover $(\sigma, \beta)$. Thus there are at most $n^2$ paths which use any edge $(v, \omega)$. Also, notice that the maximum length of any path is $2n$.

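Next we bound the quantity $A$ which is needed to apply Theorem 2.2. Recall that we have guaranteed that $\pi(\sigma) \le \max\{\pi(v), \pi(\omega)\}$. Assume first that $\pi(\sigma) \le \pi(v)$. Then, writing $\lambda = \min_{i<j} \lambda_{j,i}$,
\[ A = \max_{(v,\omega)\in E(P)} \left\{ \frac{1}{\pi(v)P(v,\omega)} \sum_{\Gamma(v,\omega)} |\gamma_{\sigma\beta}|\,\pi(\sigma)P'(\sigma,\beta) \right\} \le \max_{(v,\omega)\in E(P)} \sum_{\Gamma(v,\omega)} 2n\,\frac{P'(\sigma,\beta)}{P(v,\omega)} \le \max_{(v,\omega)\in E(P)} \sum_{\Gamma(v,\omega)} 2n\,\frac{1/(2n)}{\lambda/((1+\lambda)(n-1))} = O(n^3). \]
If, on the other hand, $\pi(\sigma) \le \pi(\omega)$, then we use detailed balance to obtain:
\begin{align*}
A &= \max_{(v,\omega)\in E(P)} \left\{ \frac{1}{\pi(v)P(v,\omega)} \sum_{\Gamma(v,\omega)} |\gamma_{\sigma\beta}|\,\pi(\sigma)P'(\sigma,\beta) \right\} \\
&= \max_{(v,\omega)\in E(P)} \left\{ \frac{1}{\pi(\omega)P(\omega,v)} \sum_{\Gamma(v,\omega)} |\gamma_{\sigma\beta}|\,\pi(\sigma)P'(\sigma,\beta) \right\} \\
&\le \max_{(v,\omega)\in E(P)} \sum_{\Gamma(v,\omega)} 2n\,\frac{P'(\sigma,\beta)}{P(\omega,v)} \\
&\le \max_{(v,\omega)\in E(P)} \sum_{\Gamma(v,\omega)} 2n\,\frac{1/(2n)}{\lambda/((1+\lambda)(n-1))} = O(n^3).
\end{align*}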

In either case, we have $A = O(n^3)$. Let $\lambda = \min_{i<j} \lambda_{j,i}$. Then $\pi_* = \min_{\rho\in\Omega} \pi(\rho) \ge \lambda^{\binom{n}{2}}/n!$, so $\log(1/(\epsilon\pi_*)) = O(n^2 \log \epsilon^{-1})$, since each $p_{i,j}$ bounded away from 1 implies $\lambda$ is a positive constant. Applying Theorem 2.2 proves that $\tau_{nn}(\epsilon) = O(n^8 \log(n/\epsilon))$. □
5.2 Comparing $\mathcal{M}_{tree}(T)$ with $\mathcal{M}_{nn}$

In this section we show that $\mathcal{M}_{nn}$ is rapidly mixing when $P$ has league structure and is weakly monotone:

Definition 5.1. The set $P$ is weakly monotone if property 1 and either property 2 or property 3 are satisfied.

1. $p_{i,j} \ge 1/2$ for all $1 \le i < j \le n$, and

2. $p_{i,j+1} \ge p_{i,j}$ for all $1 \le i < j \le n-1$, or

3. $p_{i-1,j} \ge p_{i,j}$ for all $2 \le i < j \le n$.

We note that if $P$ satisfies all three properties then it is monotone, as defined by Fill [9].
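The definition can be checked mechanically for a given set of probabilities. The following is a minimal sketch, assuming the probabilities are supplied as a function `p(i, j)` on 1-indexed pairs with $i < j$; the names `is_weakly_monotone` and `p` are hypothetical.

```python
def is_weakly_monotone(p, n):
    """Check Definition 5.1 for probabilities p(i, j), 1 <= i < j <= n."""
    pairs = [(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)]
    # Property 1: every pair is at least as likely to be in order as not.
    prop1 = all(p(i, j) >= 0.5 for i, j in pairs)
    # Property 2: p(i, j) is nondecreasing in j (needs j + 1 <= n).
    prop2 = all(p(i, j + 1) >= p(i, j) for i, j in pairs if j <= n - 1)
    # Property 3: p(i, j) is nonincreasing in i (needs i - 1 >= 1).
    prop3 = all(p(i - 1, j) >= p(i, j) for i, j in pairs if i >= 2)
    return prop1 and (prop2 or prop3)
```

For instance, a constant bias `lambda i, j: 0.7` satisfies all three properties, while a set with $p_{1,2} = 0.9$, $p_{1,3} = 0.6$, $p_{2,3} = 0.9$ violates both properties 2 and 3.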



The comparison proof in this setting is similar to that of Section 5.1 except that there may be elements between $\sigma(i)$ and $\sigma(j)$ that are larger than both and elements that are smaller than both. This poses a problem, because we may not be able to move $\sigma(i)$ past all the elements between them without greatly decreasing the weight. However, when $P$ is weakly monotone, we can introduce a trick to get around this problem. At a high level, we shift the elements between $\sigma(i)$ and $\sigma(j)$ that are smaller than $\sigma(i)$ and $\sigma(j)$ to the left in a special way, increasing the weight of the configuration in such a way that when we move $\sigma(i)$ to the right, the weight never goes below $\min\{\pi(\sigma), \pi(\beta)\}$. Specifically, we prove the following theorem.

Theorem 5.2. If $P$ has league structure, is weakly monotone and is such that $p_{i,j}$ is a constant less than 1 for all $i, j$, then the mixing time of $\mathcal{M}_{nn}$ satisfies $\tau_{nn}(\epsilon) = O(n^9 \log(n/\epsilon))$.

Again, we are assuming the worst case bound on the mixing time of $\mathcal{M}_{tree}(T)$ given in Theorem 4.3; if each $p_{i,j}$ is bounded away from 1/2 then we would get a better bound.

Proof. Throughout this proof we assume that $P$ satisfies properties 1 and 2 of the weakly monotone definition. If instead $P$ satisfies property 3, then the proof is very similar. In order to apply Theorem 2.2 to relate the mixing time of $\mathcal{M}_{nn}$ to the mixing time of $\mathcal{M}_{tree}(T)$ we need to define for each transition of $\mathcal{M}_{tree}(T)$ a canonical path using transitions of $\mathcal{M}_{nn}$. Let $e = (\sigma, \beta)$ be a transition of $\mathcal{M}_{tree}(T)$ which performs a transposition of elements $\sigma(i)$ and $\sigma(j)$ where $i < j$. If there are no elements between $\sigma(i)$ and $\sigma(j)$ then $e$ is already a transition of $\mathcal{M}_{nn}$ and we are done. Otherwise, $\sigma$ contains the string $\sigma(i), \sigma(i+1), \ldots, \sigma(j-1), \sigma(j)$ and $\beta$ contains $\sigma(j), \sigma(i+1), \ldots, \sigma(j-1), \sigma(i)$. From the definition of $\mathcal{M}_{tree}(T)$ we know that for each $\sigma(k)$, $k \in [i+1, j-1]$, either $\sigma(k) > \sigma(i), \sigma(j)$ or $\sigma(k) < \sigma(i), \sigma(j)$. Define $S = \{\sigma(k) : \sigma(k) < \sigma(i), \sigma(j)\}$ and $B = \{\sigma(k) : \sigma(k) > \sigma(i), \sigma(j)\}$. To obtain a good bound on the congestion along each edge we must ensure that the weights of the configurations on the path are not smaller than the weight of $\sigma$. To this end, we define four stages in our path from $\sigma$ to $\beta$. In the first, we shift the elements of $S$ to the left, removing an inversion with each element of $B$. In the second stage we move $\sigma(i)$ next to $\sigma(j)$, and in the third stage we move $\sigma(j)$ to $\sigma(i)$'s original location. Finally, in the fourth stage, we shift the elements of $S$ to the right to return them to their original locations. See Figure 3.

Stage 1: In this stage, for each $b \in B$, we remove an inversion involving $b$ by shifting an element of $S$ to the left past $b$. More precisely, if $\sigma(j-1) \in B$, shift $\sigma(j)$ to the left until an element from $S$ is immediately to the left of $\sigma(j)$. Next, starting at the right-most element in $S$ and moving left, for each $\sigma(k) \in S$ such that $\sigma(k-1) \in B$, move $\sigma(k)$ to the left one swap at a time until $\sigma(k)$ has an element from $S$ or $\sigma(i)$ on its immediate left (see Figure 4a). Notice that for each element $b \in B$ we have removed exactly one $(b, \sigma(k))$ inversion where $\sigma(k) \in S \cup \{\sigma(j)\}$.
Figure 3: The canonical path for transposing 5 and 7. Notice that the elements in $S$ are underlined.



Stage 2: Next perform a series of nearest neighbor swaps to move $\sigma(i)$ to the right until it is in position $j$ (the original position occupied by $\sigma(j)$ in $\sigma$, see Figure 4b). While we have created a $(b, \sigma(i))$ inversion for each element $b \in B$, we claim that the weight has not decreased from the original weight by more than a factor of $\lambda_{\sigma(j),\sigma(i)}$. This is because in Stage 1, for each element $b \in B$, we removed a $(b, s)$ inversion for some $s \in S \cup \{\sigma(j)\}$. Assume first that $s \in S$. Then since $b > \sigma(i) > s$, it follows that $p_{b,\sigma(i)} \ge p_{b,s}$ for all $s \in S$ since the $P$ are weakly monotone; thus, for each $b$ we introduce a multiplicative factor of $\lambda_{b,\sigma(i)}/\lambda_{b,s} \ge 1$. On the other hand, if $s = \sigma(j)$ then recall $p_{b,\sigma(j)} = p_{b,\sigma(i)}$ because $b$ is not a descendant of $\sigma(i) \wedge \sigma(j)$ in the tree $T$. Hence the current configuration has weight at least $\lambda_{\sigma(j),\sigma(i)}\,\pi(\sigma)$. Since $\lambda_{\sigma(j),\sigma(i)}$ is also the ratio of $\pi(\beta)$ and $\pi(\sigma)$, it follows that the weight at every step of Stage 2 does not go below $\min\{\pi(\sigma), \pi(\beta)\}$. For each $\sigma(k) \in S$ we have also removed a $(\sigma(i), \sigma(k))$ inversion, which can only increase the weight of the configuration.
Figure 4: (a) Stage 1 and (b) Stage 2 of the canonical path for transposing 5 and 7.

Stage 3: Perform a series of nearest neighbor swaps to move $\sigma(j)$ to the left until it is in the same position $\sigma(i)$ was originally. While we created a $(\sigma(k), \sigma(j))$ inversion for each $\sigma(k) \in S$, these inversions have the same weight as the $(\sigma(i), \sigma(k))$ inversions we removed in Stage 2. In addition we have removed a $(\sigma(l), \sigma(j))$ inversion for each $\sigma(l) \in B$.

Stage 4: Finally we want to return the elements in S and B to their original position. Starting 
with the left-most element in S that was moved in Stage 1, perform the nearest neighbor swaps to 
the right necessary to return it to its original position. 

It is clear from the definition of the stages that along any path the weight of a configuration never decreases below $\min(\pi(\sigma), \pi(\beta))$. Given a transition $(v, \omega)$ of $\mathcal{M}_{nn}$ we must upper bound the number of canonical paths $\gamma_{\sigma\beta}$ that use this edge. Thus, we analyze the amount of information needed in addition to $(v, \omega)$ to determine $\sigma$ and $\beta$ uniquely. First we record whether $(\sigma, \beta)$ is already a nearest neighbor transition or which stage we are in. Next, for any of the 4 stages we record the original location of $\sigma(i)$ and $\sigma(j)$. Given this information, along with $v$ and $\omega$, we can uniquely recover $(\sigma, \beta)$. Hence, there are at most $4n^2$ paths through any edge $(v, \omega)$. Also, note that the maximum length of any path is $4n$.

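Next we bound the quantity $A$ which is needed to apply Theorem 2.2. Recall that for each transition $(v, \omega)$ of the path $\gamma_{\sigma\beta}$, we have guaranteed that $\pi(v) \ge \min\{\pi(\sigma), \pi(\beta)\}$. Assume first that $\pi(v) \ge \pi(\sigma)$. Then, writing $\lambda = \min_{i<j} \lambda_{j,i}$,
\[ A = \max_{(v,\omega)\in E(P)} \left\{ \frac{1}{\pi(v)P(v,\omega)} \sum_{\Gamma(v,\omega)} |\gamma_{\sigma\beta}|\,\pi(\sigma)P'(\sigma,\beta) \right\} \le \max_{(v,\omega)\in E(P)} \sum_{\Gamma(v,\omega)} 4n\,\frac{P'(\sigma,\beta)}{P(v,\omega)} \le \max_{(v,\omega)\in E(P)} \sum_{\Gamma(v,\omega)} 4n\,\frac{1/\binom{n}{2}}{\lambda/((1+\lambda)(n-1))} = O(n^2). \]
If, on the other hand, $\pi(v) \ge \pi(\beta)$, then we use detailed balance to obtain:
\begin{align*}
A &= \max_{(v,\omega)\in E(P)} \left\{ \frac{1}{\pi(v)P(v,\omega)} \sum_{\Gamma(v,\omega)} |\gamma_{\sigma\beta}|\,\pi(\sigma)P'(\sigma,\beta) \right\} \\
&= \max_{(v,\omega)\in E(P)} \left\{ \frac{1}{\pi(v)P(v,\omega)} \sum_{\Gamma(v,\omega)} |\gamma_{\sigma\beta}|\,\pi(\beta)P'(\beta,\sigma) \right\} \\
&\le \max_{(v,\omega)\in E(P)} \sum_{\Gamma(v,\omega)} 4n\,\frac{P'(\beta,\sigma)}{P(v,\omega)} \\
&\le \max_{(v,\omega)\in E(P)} \sum_{\Gamma(v,\omega)} 4n\,\frac{1/\binom{n}{2}}{\lambda/((1+\lambda)(n-1))} = O(n^2).
\end{align*}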

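In either case, we have $A = O(n^2)$. Let $\lambda = \min_{i<j} \lambda_{j,i}$. Then $\pi_* = \min_{\rho\in\Omega} \pi(\rho) \ge \lambda^{\binom{n}{2}}/n!$, so $\log(1/(\epsilon\pi_*)) = O(n^2 \log \epsilon^{-1})$, as above. Applying Theorem 2.2 proves that $\tau_{nn}(\epsilon) = O(n^9 \log(n/\epsilon))$. □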



Remark 5.3. By repeating Stage 1 of the path a constant number of times, it is possible to relax the weakly monotone condition slightly if we are satisfied with a polynomial bound on the mixing time.



6 Slow Mixing of $\mathcal{M}_{nn}$

We conclude by showing that while $\mathcal{M}_{nn}$ is rapidly mixing for two large, interesting classes of inputs, this is not true in general. In particular, we show that there are positively biased permutations for which the chain $\mathcal{M}_{nn}$ requires exponential time to converge to equilibrium. This disproves the conjecture that the chain will always be fast when $P$ satisfies $p_{i,j} \ge 1/2$ for all $i < j$.

Our example comes from sampling staircase walks with fluctuating bias, which were examined in [10] and [15]. Staircase walks are sequences of $n$ ones and $n$ zeros, which correspond to paths from $(0, n)$ to $(n, 0)$, where each 1 represents a step to the right and each 0 represents a step down (see Figure 5b). For ease of notation in the following proof, we replace the zeroes by negative ones. In [15], Randall and Streib examined the Markov chain which attempts to swap a neighboring $(1, -1)$ pair, which essentially adds or removes a unit square from the region below the walk, with probability depending on the position of that unit square. We will show that for our choice of $P$, permutations are equivalent to staircase walks, and hence the proof that the Markov chain on staircase walks is slow applies in our setting as well.
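As a small illustration of this correspondence, the sketch below maps a permutation of $2n$ elements to a $\pm 1$ sequence by sending the $n$ smallest elements to $+1$ and the $n$ largest to $-1$, which is the map $f$ used in this section. The function name `to_staircase` is hypothetical, and the sketch ignores the requirement that each half be in increasing order.

```python
def to_staircase(perm):
    """Map a permutation of 1..2n to a sequence of +1/-1 steps:
    elements at most n become +1, the remaining elements become -1."""
    assert len(perm) % 2 == 0
    n = len(perm) // 2
    return [1 if x <= n else -1 for x in perm]
```

Applied to the permutation $(5,1,7,8,4,3,6,2)$ with $n = 4$, this gives $(-1, 1, -1, -1, 1, 1, -1, 1)$, a balanced sequence with four steps of each kind.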
Suppose, for ease of notation, that we are sampling permutations with $2n$ entries (having an odd number of elements will not cause qualitatively different behavior). Let $M = n - \sqrt{n}$, $\epsilon = 1/(16n+2)$, and let $0 < \delta < 1/2$ be a constant to be defined later. For $i < j \le n$ or $n < i < j$, $p_{i,j} = 1$, ensuring that once the elements $1, 2, \ldots, n$ get in order, they stay in order (and similarly for the elements $n+1, n+2, \ldots, 2n$). The $p_{i,j}$ values for $i \le n < j$ are defined as follows (see Figure 5a):
\[ p_{i,j} = \begin{cases} 1 - \delta & \text{if } i - j + 2n + 1 > n + M; \\ 1/2 + \epsilon & \text{otherwise.} \end{cases} \]

Since the smallest (largest) $n$ elements of the biased permutation never change order once they get put into increasing order, permutations with these elements out of order have zero stationary probability. Hence we can represent the smallest $n$ numbers as ones and the largest $n$ numbers as negative ones, assuming that within each class the elements are in increasing order. Given a permutation $\sigma$, let $f(\sigma)$ be the sequence of 1's and $-1$'s such that $f(\sigma)_i = 1$ if $\sigma(i) \le n$ and $-1$ otherwise. Then if $\sigma$ is such that elements $1, 2, \ldots, n$ and elements $n+1, n+2, \ldots, 2n$ are each in order, $f(\sigma)$ maps $\sigma$ uniquely to a staircase walk. For example, the permutation $\sigma = (5,1,7,8,4,3,6,2)$ maps to $f(\sigma) = (-1, 1, -1, -1, 1, 1, -1, 1)$. The probability that an adjacent 1 and $-1$ swap in $\mathcal{M}_{nn}$ then depends on how many 1's and $-1$'s occur before that point in the permutation. Specifically, if element $i$ is $-1$ and element $i+1$ is 1 then we swap them with probability $1/2 + \epsilon$ if the number of 1's occurring before position $i$ plus the number of $-1$'s occurring after $i+1$ is less than $n + M - 1$. Otherwise, they swap with probability $1 - \delta$. Equivalently, the probability of adding a unit square at position $v = (x, y)$, which is called the bias at $v = (x, y)$, is $1/2 + \epsilon$ if $x + y < n + M$, and $1 - \delta$ otherwise; see Figure 5a. We will show that in this case, the Markov chain is slow. The idea is that in the stationary distribution, there is a good chance that the positive and negative ones will be well-mixed, since this is a high entropy situation. However, the identity permutation also has high weight, and the parameters are chosen so that the entropy of the well-mixed permutations balances with the energy of the maximum (identity) permutation, and so that getting between them is not very likely (low entropy and low energy).

We identify sets $S_1, S_2, S_3$ such that $\pi(S_2)$ is exponentially smaller than both $\pi(S_1)$ and $\pi(S_3)$, but to get between $S_1$ and $S_3$, $\mathcal{M}_{nn}$ and $\mathcal{M}_t$ must pass through $S_2$, the cut. Then we use the conductance to prove $\mathcal{M}_{nn}$ and $\mathcal{M}_t$ are slowly mixing. For an ergodic Markov chain with stationary distribution $\pi$, the conductance is
\[ \Phi = \min_{\substack{S \subseteq \Omega \\ \pi(S) \le 1/2}} \sum_{s_1 \in S,\, s_2 \in \bar{S}} \pi(s_1)P(s_1, s_2)/\pi(S), \]
and we will show that the bad cut $(S_1, S_2, S_3)$ defined in Section 6 implies that $\Phi$ is exponentially small. The following theorem relates the conductance and mixing time (see [11]).

Theorem 6.1. For any Markov chain with conductance $\Phi$, $\tau \ge (4\Phi)^{-1} - 1/2$.

We are now ready to prove the main theorem from this section.

Theorem 6.2. There exists a set $P$ for which $\mathcal{M}_{nn}$ has mixing time $\tau(\epsilon) \ge e^{n/24}/4 - 1/2$.

Proof. For a staircase walk $\sigma$ consisting of a sequence of steps $\sigma_j \in \{\pm 1\}$, define the height of $\sigma_i$ as $\sum_{j \le i} \sigma_j$, and let $\max(\sigma)$ be the maximum height of $\sigma_i$ over all $1 \le i \le 2n$. Let $S_1$ be the set of configurations $\sigma$ such that $\max(\sigma) < n + M$, $S_2$ the set of configurations such that $\max(\sigma) = n + M$, and $S_3$ the set of configurations such that $\max(\sigma) > n + M$. That is, $S_1$ is the set of configurations that never reach the dark blue diagonal in Figure 5b, $S_2$ is the set whose maximum peak is on the dark blue line, and $S_3$ is the set which crosses that line and contains squares in the light blue triangle. Clearly to move from $S_1$ to $S_3$, the Markov chain must go through $S_2$.

Figure 5: (a) Fluctuating bias with exponential mixing time. (b) Staircase walks in $S_1$, $S_2$, and $S_3$.

Define $\gamma = (1/2 + \epsilon)/(1/2 - \epsilon)$, which is the ratio of the weights of two configurations that differ by swapping a $(1, -1)$ pair with probability $1/2 + \epsilon$. By the definition of $\epsilon$, we have $\gamma = 1 + \frac{1}{4n}$. Let $\xi = (1-\delta)/\delta$, which is the ratio of the weights of two configurations that differ by swapping a $(1, -1)$ pair with probability $1 - \delta$. Finally, let $b(\sigma)$ be the number of tiles below the diagonal $M$ in $\sigma$ and $a(\sigma)$ be the number of tiles above the diagonal $M$ in $\sigma$. Then by detailed balance, $\pi(S_1) = Z^{-1}\sum_{\sigma\in S_1} \gamma^{b(\sigma)}$, $\pi(S_2) = Z^{-1}\sum_{\sigma\in S_2} \gamma^{b(\sigma)}$, and $\pi(S_3) = Z^{-1}\sum_{\sigma\in S_3} \gamma^{b(\sigma)}\xi^{a(\sigma)}$, where $Z$ is a normalizing constant. We will show that there exists a constant $0 < \delta < 1/2$ such that $\pi(S_2)$ is exponentially smaller than both $\pi(S_1)$ and $\pi(S_3)$, which will have equal weight.

First we show that $\pi(S_2)$ is exponentially smaller than $\pi(S_1)$ for all values of $\delta$. Since there are at most $n^2 - (n-M)^2/2 = n^2 - n/2$ tiles with weight $\gamma$ in any $\sigma \in S_2$, we have
\[ \pi(S_2) = Z^{-1}\sum_{\sigma\in S_2} \gamma^{b(\sigma)} \le Z^{-1}\gamma^{n^2 - n/2}|S_2| \le Z^{-1}e^{n/4 - 1/8}|S_2|, \]
since $\gamma^{n^2 - n/2} = \left(1 + \frac{1}{4n}\right)^{n^2 - n/2} \le e^{n/4 - 1/8}$.

Next we will bound $|S_2 \cup S_3|$, which in turn provides an upper bound on $|S_2|$. The unbiased Markov chain is equivalent to a simple random walk $W_{2n} = X_1 + X_2 + \cdots + X_{2n} = 0$, where $X_i \in \{+1, -1\}$ and where a $+1$ represents a step to the right and a $-1$ represents a step down. We call this random walk tethered since it is required to end at 0 after $2n$ steps. Compare the walk $W_{2n}$ with the untethered simple random walk $W'_{2n} = X'_1 + X'_2 + \cdots + X'_{2n}$:
\begin{align*}
P\left(\max_{1\le t\le 2n} W_t \ge M\right) &= P\left(\max_{1\le t\le 2n} W'_t \ge M \,\Big|\, W'_{2n} = 0\right) \\
&\le \frac{P\left(\max_{1\le t\le 2n} W'_t \ge M\right)}{P(W'_{2n} = 0)} \\
&= \frac{2^{2n}}{\binom{2n}{n}}\, P\left(\max_{1\le t\le 2n} W'_t \ge M\right) \\
&\approx \sqrt{\pi n}\; P\left(\max_{1\le t\le 2n} W'_t \ge M\right).
\end{align*}
Since the $X'_i$ are independent, we can use Chernoff bounds to see that
\[ P\left(\max_{1\le t\le 2n} W'_t \ge M\right) \le 2n\,P(W'_{2n} \ge M) \le 2n\,e^{-M^2/(2n)} = \frac{2n}{e^{M^2/(2n)}}. \]
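Notice that $M^2/(2n) = (n - \sqrt{n})^2/(2n) = (\sqrt{n} - 1)^2/2 > n/3$ for $n \ge 30$. Together these show that $P\left(\max_{1\le t\le 2n} W_t \ge M\right) < 2\sqrt{\pi n^3}\,e^{-n/3}$. In particular,
\[ |S_2| \le |S_2 \cup S_3| \le \binom{2n}{n}\,2\sqrt{\pi n^3}\,e^{-n/3}. \]
Therefore, since every configuration in $S_1$ has weight at least $Z^{-1}$, we have
\[ \frac{\pi(S_2)}{\pi(S_1)} \le \frac{e^{n/4 - 1/8}\,|S_2|}{|S_1|} \le \frac{e^{n/4 - 1/8}\,2\sqrt{\pi n^3}\,e^{-n/3}}{1 - 2\sqrt{\pi n^3}\,e^{-n/3}} < e^{-n/24} \]
for large enough $n$. Therefore, $\pi(S_2)$ is exponentially smaller than $\pi(S_1)$ for every value of $\delta$.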

Our goal is to show that there exists a value of $\delta$ for which $\pi(S_3) = \pi(S_1)$, which will imply that $\pi(S_2)$ is also exponentially smaller than $\pi(S_3)$, and hence the set $S_2$ forms a bad cut, regardless of which state the Markov chain begins in. To find this value of $\delta$, we will rely on the continuity of the function $f(\xi) = Z\pi(S_3) - Z\pi(S_1)$ with respect to $\xi = (1-\delta)/\delta$. Notice that $Z\pi(S_1)$ is constant with respect to $\xi$ and $Z\pi(S_3) = \sum_{\sigma\in S_3} \gamma^{b(\sigma)}\xi^{a(\sigma)}$ is just a polynomial in $\xi$. Therefore $Z\pi(S_3)$ is continuous in $\xi$ and hence $f(\xi)$ is also continuous with respect to $\xi$. Moreover, when $\xi = \gamma$, clearly $Z\pi(S_3) < Z\pi(S_1)$, so $f(\gamma) < 0$. We will show that $f(4e^2) \ge 0$, and so by continuity we will conclude that there exists a value of $\xi$ satisfying $\gamma < \xi \le 4e^2$ for which $f(\xi) = 0$ and $Z\pi(S_3) = Z\pi(S_1)$. Clearly this implies that for this choice of $\xi$, $\pi(S_3) = \pi(S_1)$, as desired. To obtain the corresponding value of $\delta$, we notice that $\delta = 1/(\xi + 1)$. In particular, $\delta$ is a constant satisfying $\frac{1}{4e^2 + 1} \le \delta < \frac{1}{2}$.

Thus it remains to show that $f(4e^2) \ge 0$. First we notice that since the maximal tiling is in $S_3$,
\[ \pi(S_3) \ge Z^{-1}\,\gamma^{\,n^2 - \frac{(n-M)^2}{2}}\,\xi^{\frac{(n-M)^2}{2}}. \]
Also,
\[ \pi(S_1) \le Z^{-1}\,\gamma^{\,n^2 - \frac{(n-M)^2}{2}}\binom{2n}{n}. \]
Therefore
\[ \pi(S_1)/\pi(S_3) \le \frac{\binom{2n}{n}}{\xi^{n/2}} \le (2e)^n\,\xi^{-n/2} = 1, \]
since $\xi = 4e^2$. Hence $f(4e^2) = Z\pi(S_3) - Z\pi(S_1) \ge Z\pi(S_3) - Z\pi(S_3) = 0$, as desired.
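Thus, the conductance satisfies
\[ \Phi \le \frac{1}{\pi(S_1)}\sum_{s_1\in S_1,\, s_2\in S_2} \pi(s_1)P(s_1, s_2) \le \frac{\pi(S_2)}{\pi(S_1)} \le e^{-n/24}. \]
Hence, by Theorem 6.1, the mixing time satisfies
\[ \tau \ge (4e^{-n/24})^{-1} - 1/2 = e^{n/24}/4 - 1/2. \]
□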
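To make the role of the conductance concrete, the following sketch computes it by brute force for a very small chain and evaluates the lower bound of Theorem 6.1. It is only an illustration: it enumerates all subsets, so it is exponential in the number of states, and the names `conductance` and `mixing_lower_bound` are hypothetical.

```python
from itertools import combinations


def conductance(P, pi):
    """Brute-force conductance of a small chain.

    P:  row-stochastic transition matrix as a list of lists
    pi: stationary distribution as a list
    """
    states = range(len(P))
    best = float("inf")
    for r in range(1, len(P)):
        for S in combinations(states, r):
            mass = sum(pi[s] for s in S)
            if mass == 0 or mass > 0.5:
                continue  # only sets with 0 < pi(S) <= 1/2 are considered
            inside = set(S)
            # Stationary flow leaving S.
            flow = sum(pi[a] * P[a][b] for a in S for b in states if b not in inside)
            best = min(best, flow / mass)
    return best


def mixing_lower_bound(phi):
    """Lower bound (4 * phi)^(-1) - 1/2 from Theorem 6.1."""
    return 1.0 / (4.0 * phi) - 0.5
```

For the two-state chain with transition matrix `[[0.9, 0.1], [0.1, 0.9]]` and uniform stationary distribution, the conductance is 0.1 and the resulting lower bound is 2. A bad cut such as $S_2$ above drives the conductance down and the bound up.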
In fact, this proof can be extended to the more general Markov chain where we can swap any 1 with any 0, as long as we maintain the correct stationary distribution. This is easy to see, because any move that swaps a single 1 with a single 0 can only change the maximum height by at most 2 (see Figure 6). If we expand $S_2$ to include all configurations with maximum height $n + M$ or $n + M + 1$, $\pi(S_2)$ is still exponentially smaller than $\pi(S_1)$ and $\pi(S_3)$. Hence the Markov chain over permutations that can make arbitrary transpositions can still take exponential time to converge.

Figure 6: A move that swaps an arbitrary (1,0) pair.



7 Analyzing a Product of Markov chains 

For each of our positive results, we showed that the Markov chain in question can be decomposed into $M$ independent Markov chains. Since each Markov chain $\mathcal{M}_i$ operates independently, the overall mixing time will be roughly $M$ times the mixing time of each piece, slowed down by the inverse probability of selecting that process. Similar results have been proved before (e.g., see [2, 3]) in other settings. We include the proof for completeness.

Theorem 7.1. Suppose the Markov chain $\mathcal{M}$ is a product of $M$ independent Markov chains $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_M$, where $\mathcal{M}$ updates $\mathcal{M}_i$ with probability $p_i$, where $\sum_i p_i = 1$. If $\tau_i(\epsilon)$ is the mixing time for $\mathcal{M}_i$ and $\tau_i(\epsilon) \ge 4\ln\epsilon$ for each $i$, then
\[ \tau(\epsilon) \le \max_{i=1,2,\ldots,M} \frac{2}{p_i}\,\tau_i\!\left(\frac{\epsilon}{2M}\right). \]
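The bound of Theorem 7.1 is simple to evaluate once the component mixing times are known. The sketch below assumes each component mixing time is supplied as a function of the accuracy parameter; the names `product_mixing_bound`, `probs`, and `taus` are hypothetical.

```python
import math


def product_mixing_bound(probs, taus, eps):
    """Evaluate max_i (2 / p_i) * tau_i(eps / (2M)) for a product chain.

    probs: selection probabilities p_1, ..., p_M (assumed to sum to 1)
    taus:  functions tau_i(eps) returning each component's mixing time
    eps:   target total variation distance
    """
    M = len(probs)
    assert len(taus) == M
    assert abs(sum(probs) - 1.0) < 1e-9
    return max(2.0 / p * tau(eps / (2 * M)) for p, tau in zip(probs, taus))
```

For example, with two components selected with probability 1/2 each and constant component mixing times 10 and 3, the bound evaluates to 40, so the slowest component dominates after rescaling by its selection probability.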

Proof. Suppose the Markov chain $\mathcal{M}$ has transition matrix $P$, and each $\mathcal{M}_i$ has transition matrix $P_i$ and state space $\Omega_i$. Let $B_i = p_i P_i + (1 - p_i)I$, where $I$ is the identity matrix of the same size as $P_i$, be the transition matrix of $\mathcal{M}_i$, slowed down by the probability $p_i$ of selecting $\mathcal{M}_i$. First we show that the total variation distance satisfies
\[ 1 + 2d_{tv}(P^t, \pi) \le \prod_i \left(1 + 2d_{tv}(B_i^t, \pi_i)\right). \]
To show this, notice that for $x = (x_1, x_2, \ldots, x_M),\ y = (y_1, y_2, \ldots, y_M) \in \Omega$, $P^t(x, y) = \prod_i B_i^t(x_i, y_i)$. Let $\epsilon_i(x_i, y_i) = B_i^t(x_i, y_i) - \pi_i(y_i)$ and for any $x_i \in \Omega_i$,
\[ \epsilon_i(x_i) = \sum_{y_i\in\Omega_i} |\epsilon_i(x_i, y_i)| \le 2d_{tv}(B_i^t, \pi_i). \]
Then,
Then, 
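\begin{align*}
d_{tv}(P^t, \pi) &= \max_{x\in\Omega} \frac{1}{2}\sum_{y\in\Omega} \left|P^t(x, y) - \pi(y)\right| \\
&= \max_{x\in\Omega} \frac{1}{2}\sum_{y\in\Omega} \left|\prod_i B_i^t(x_i, y_i) - \prod_i \pi_i(y_i)\right| \\
&= \max_{x\in\Omega} \frac{1}{2}\sum_{y\in\Omega} \left|\prod_i \left(\epsilon_i(x_i, y_i) + \pi_i(y_i)\right) - \prod_i \pi_i(y_i)\right| \\
&= \max_{x\in\Omega} \frac{1}{2}\sum_{y\in\Omega} \left|\sum_{S\subseteq[M],\, S\ne\emptyset}\ \prod_{i\in S}\epsilon_i(x_i, y_i)\prod_{i\notin S}\pi_i(y_i)\right| \\
&\le \max_{x\in\Omega} \frac{1}{2}\sum_{S\subseteq[M],\, S\ne\emptyset}\ \prod_{i\in S}\sum_{y_i\in\Omega_i}|\epsilon_i(x_i, y_i)|\ \prod_{i\notin S}\sum_{y_i\in\Omega_i}\pi_i(y_i) \\
&= \max_{x\in\Omega} \frac{1}{2}\sum_{S\subseteq[M],\, S\ne\emptyset}\ \prod_{i\in S}\epsilon_i(x_i) \\
&= \max_{x\in\Omega} \frac{1}{2}\prod_i \left(1 + \epsilon_i(x_i)\right) - \frac{1}{2} \le \frac{1}{2}\prod_i \left(1 + 2d_{tv}(B_i^t, \pi_i)\right) - \frac{1}{2},
\end{align*}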

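as desired. Thus in order to get $d_{tv}(P^t, \pi) \le \epsilon$, it suffices to show $d_{tv}(B_i^t, \pi_i) \le \epsilon/(2M)$ for each $i$, because then
\[ 1 + 2d_{tv}(P^t, \pi) \le \prod_i \left(1 + 2d_{tv}(B_i^t, \pi_i)\right) \le \prod_i \left(1 + 2\epsilon/(2M)\right) \le e^{\epsilon} \le 1 + 2\epsilon. \]
Hence it suffices to show $d_{tv}(B_i^t, \pi_i) \le \epsilon/(2M)$ for each $i$.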
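Since $B_i^t = (p_i P_i + (1 - p_i)I)^t = \sum_{j=0}^{t} \binom{t}{j} p_i^j (1 - p_i)^{t-j} P_i^j$, we have
\begin{align*}
d_{tv}(B_i^t, \pi_i) &= \max_{x_i\in\Omega_i} \frac{1}{2}\sum_{y_i\in\Omega_i} \left|B_i^t(x_i, y_i) - \pi_i(y_i)\right| \\
&= \max_{x_i\in\Omega_i} \frac{1}{2}\sum_{y_i\in\Omega_i} \left|\sum_{j=0}^{t}\binom{t}{j} p_i^j (1 - p_i)^{t-j} P_i^j(x_i, y_i) - \pi_i(y_i)\right| \\
&\le \max_{x_i\in\Omega_i} \frac{1}{2}\sum_{y_i\in\Omega_i}\sum_{j=0}^{t}\binom{t}{j} p_i^j (1 - p_i)^{t-j} \left|P_i^j(x_i, y_i) - \pi_i(y_i)\right| \\
&\le \sum_{j=0}^{t}\binom{t}{j} p_i^j (1 - p_i)^{t-j}\, d_{tv}(P_i^j, \pi_i).
\end{align*}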

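Let $t_i = \tau_i(\epsilon/(4M))$. Now, for $j \ge t_i$, we have that $d_{tv}(P_i^j, \pi_i) \le \epsilon/(4M)$. For all $j$, we have $d_{tv}(P_i^j, \pi_i) \le 2$, so if $X$ is a binomial random variable with parameters $t$ and $p_i$, we have
\begin{align*}
\sum_{j=0}^{t}\binom{t}{j} p_i^j (1 - p_i)^{t-j}\, d_{tv}(P_i^j, \pi_i) &= \sum_{j=0}^{t_i - 1}\binom{t}{j} p_i^j (1 - p_i)^{t-j}\, d_{tv}(P_i^j, \pi_i) + \sum_{j=t_i}^{t}\binom{t}{j} p_i^j (1 - p_i)^{t-j}\, d_{tv}(P_i^j, \pi_i) \\
&\le 2\sum_{j=0}^{t_i - 1}\binom{t}{j} p_i^j (1 - p_i)^{t-j} + \sum_{j=t_i}^{t}\binom{t}{j} p_i^j (1 - p_i)^{t-j}\,\frac{\epsilon}{4M} \\
&\le 2P(X < t_i) + \epsilon/(4M).
\end{align*}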

By Chernoff bounds, $P(X < (1 - \delta)tp_i) \le e^{-tp_i\delta^2/2}$. Setting $\delta = 1 - t_i/(tp_i)$, then for all $t \ge 2t_i/p_i$, $\delta^2 \ge 1/4$ and we have
\[ P(X < t_i) \le e^{-tp_i\delta^2/2} \le e^{-tp_i/8} \le \epsilon/(8M), \]
as long as $t \ge 8\ln(8M/\epsilon)/p_i$. Therefore for $t \ge \max\{8\ln(8M/\epsilon)/p_i,\ 2t_i/p_i\}$,
\[ d_{tv}(B_i^t, \pi_i) \le 2P(X < t_i) + \epsilon/(4M) \le 2\epsilon/(8M) + \epsilon/(4M) = \epsilon/(2M). \]
Hence by time $t$ the total variation distance satisfies $d_{tv}(P^t, \pi) \le \epsilon$.

□